Adversarial Training with Fast Gradient Projection Method against Synonym Substitution Based Text Attacks

Authors

Xiaosen Wang, Yichen Yang, Yihe Deng, Kun He

Abstract

Adversarial training is the most empirically successful approach to improving the robustness of deep neural networks for image classification. For text classification, however, existing synonym substitution based adversarial attacks are effective but not efficient enough to be incorporated into practical text adversarial training. Gradient-based attacks, which are very efficient for images, are hard to implement for synonym substitution based text attacks due to the lexical, grammatical and semantic constraints of the discrete input space. Thereby, we propose a fast text attack method based on synonym substitution, called the Fast Gradient Projection Method (FGPM), which is about 20 times faster than existing text attack methods while achieving comparable attack performance. We then incorporate FGPM with adversarial training and propose a text defense method called Adversarial Training with FGPM enhanced by Logit pairing (ATFL). Experiments show that ATFL can significantly improve model robustness and block the transferability of adversarial examples.
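As a rough illustration of the projection step (a minimal sketch, not the authors' released implementation): for each candidate synonym, the loss change of the swap is estimated to first order by projecting the embedding difference onto the loss gradient, and the highest-scoring substitution is applied greedily. The `classify_from_embeddings` helper and the `synonyms` table below are assumptions.

```python
# Minimal sketch of a gradient-projection synonym attack in the spirit of
# FGPM (not the authors' released code). `classify_from_embeddings` and the
# `synonyms` table are assumed helpers for illustration.
import torch
import torch.nn.functional as F

def fgpm_attack(model, token_ids, label, synonyms, max_subs=10):
    """Greedily apply the synonym swap with the largest projected loss gain.

    token_ids: LongTensor [seq_len]; label: true class index;
    synonyms: dict mapping a token id to a list of candidate synonym ids.
    """
    token_ids = token_ids.clone()
    for _ in range(max_subs):
        # Gradient of the loss w.r.t. the word embeddings of the current text.
        emb = model.embedding(token_ids).detach().requires_grad_(True)
        logits = model.classify_from_embeddings(emb)  # assumed helper
        F.cross_entropy(logits.unsqueeze(0), torch.tensor([label])).backward()
        grad = emb.grad  # [seq_len, dim]

        best = None  # (projected loss gain, position, substitute id)
        for pos in range(token_ids.numel()):
            for cand in synonyms.get(int(token_ids[pos]), []):
                delta = model.embedding.weight[cand] - emb[pos].detach()
                # First-order estimate of the loss change for this swap.
                gain = float(torch.dot(delta, grad[pos]))
                if best is None or gain > best[0]:
                    best = (gain, pos, cand)
        if best is None or best[0] <= 0:
            break  # no substitution is predicted to increase the loss
        token_ids[best[1]] = best[2]
        if model(token_ids.unsqueeze(0)).argmax(-1).item() != label:
            break  # the substituted text is already misclassified
    return token_ids
```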


Related Articles

A Multi-strength Adversarial Training Method to Mitigate Adversarial Attacks

Some recent works revealed that deep neural networks (DNNs) are vulnerable to so-called adversarial attacks, where input examples are intentionally perturbed to fool DNNs. In this work, we revisit a DNN training process that incorporates adversarial examples into the training dataset so as to improve the DNN's resilience to adversarial attacks, namely, adversarial training. Our experiments show that d...
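As an illustration of the general recipe described above (not the paper's multi-strength variant), a single adversarial-training step with FGSM-crafted examples might look like this in PyTorch; `eps` and the equal clean/adversarial loss weighting are assumptions:

```python
# A minimal PyTorch adversarial-training step with FGSM-crafted examples,
# illustrating the general recipe rather than the paper's multi-strength
# method; `eps` and the clean/adversarial loss mix are assumptions.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    # Craft adversarial examples from the current model state (FGSM).
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    # Train on the clean and adversarial inputs together.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```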


Discriminative Training for Near-Synonym Substitution

Near-synonyms are useful knowledge resources for many natural language applications such as query expansion for information retrieval (IR) and paraphrasing for text generation. However, near-synonyms are not necessarily interchangeable in context due to their specific usage and syntactic constraints. Accordingly, it is worthwhile to develop algorithms to verify whether near-synonyms do match the gi...
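A toy discriminative set-up for this verification task might look as follows, with a bag-of-context-words logistic regression standing in for the paper's method; the contexts, labels, and slot convention are invented for illustration:

```python
# Toy discriminative set-up for near-synonym verification: a bag-of-context-
# words logistic regression stands in for the paper's method. The contexts,
# labels, and the "___" slot convention are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Each item is a sentence with the target slot blanked out, labeled by the
# near-synonym that originally filled it.
contexts = ["the program crashed with a fatal ___",
            "admitting the ___ took courage"]
labels = ["error", "mistake"]

vec = CountVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(contexts), labels)

# Verify which near-synonym best matches an unseen context.
print(clf.predict(vec.transform(["the compiler reported a syntax ___"])))
```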


Generating Text via Adversarial Training

Generative Adversarial Networks (GANs) have achieved great success in generating realistic synthetic real-valued data. However, the discrete output of language models hinders the application of gradient-based GANs. In this paper we propose a generic framework employing Long Short-Term Memory (LSTM) and convolutional neural network (CNN) for adversarial training to generate realistic text...
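A bare skeleton of such an LSTM-generator / CNN-discriminator pairing is sketched below in PyTorch; the sizes and module layout are placeholders, and the paper's workaround for the non-differentiable discrete output is omitted:

```python
# Bare skeleton of the LSTM-generator / CNN-discriminator pairing in PyTorch;
# sizes and module layout are placeholders, and the paper's handling of the
# non-differentiable discrete output is omitted.
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens):          # tokens: [batch, seq_len]
        h, _ = self.lstm(self.emb(tokens))
        return self.out(h)              # next-token logits per position

class Discriminator(nn.Module):
    def __init__(self, vocab_size, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.conv = nn.Conv1d(dim, 64, kernel_size=3, padding=1)
        self.head = nn.Linear(64, 1)

    def forward(self, tokens):          # tokens: [batch, seq_len]
        h = self.conv(self.emb(tokens).transpose(1, 2)).relu()
        return self.head(h.max(dim=2).values)  # real/fake score
```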


Ensemble Adversarial Training: Attacks and Defenses

Machine learning models are vulnerable to adversarial examples, inputs maliciously perturbed to mislead the model. These inputs transfer between models, thus enabling black-box attacks against deployed models. Adversarial training increases robustness to attacks by injecting adversarial examples into training data. Surprisingly, we find that although adversarially trained models exhibit strong ...
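The distinguishing idea, crafting perturbations on static pre-trained source models rather than on the model being trained, can be sketched as follows, assuming a list `sources` of such models:

```python
# Sketch of ensemble adversarial training: perturbations come from static
# pre-trained `sources`, not from the model being trained, which decouples
# the crafted examples from the defender's own gradients.
import random
import torch
import torch.nn.functional as F

def ensemble_adv_batch(sources, x, y, eps=0.03):
    src = random.choice(sources)  # pick one held-out source model
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(src(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def training_step(model, optimizer, sources, x, y):
    x_adv = ensemble_adv_batch(sources, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```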


Sparsity-based Defense against Adversarial Attacks on Linear Classifiers

Deep neural networks represent the state of the art in machine learning in a growing number of fields, including vision, speech and natural language processing. However, recent work raises important questions about the robustness of such architectures, by showing that it is possible to induce classification errors through tiny, almost imperceptible, perturbations. Vulnerability to such “adversa...
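One common instantiation of a sparsity-based defense is a front end that keeps only the largest coefficients of the input in some orthonormal basis before the linear classifier sees it; a minimal numpy sketch, with `basis` and `k` as assumptions:

```python
# numpy sketch of a sparsifying front end for a linear classifier: project
# the input onto an orthonormal `basis`, keep only the k largest-magnitude
# coefficients, and classify the reconstruction. `basis` and `k` are assumed.
import numpy as np

def sparsify(x, basis, k):
    coeffs = basis.T @ x                     # analysis transform
    small = np.argsort(np.abs(coeffs))[:-k]  # all but the k largest
    coeffs[small] = 0.0                      # discard small coefficients
    return basis @ coeffs                    # reconstruct the sparse input

def defended_predict(w, b, x, basis, k=32):
    # Adversarial perturbations spread thinly over many coefficients are
    # largely removed before the linear classifier w.x + b is applied.
    return np.sign(w @ sparsify(x, basis, k) + b)
```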



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i16.17648